Less is More: Non-Redundant Subspace Clustering

Authors

  • Ira Assent
  • Emmanuel Müller
  • Stephan Günnemann
  • Thomas Seidl
Abstract

Clustering is an important data mining task for grouping similar objects. In high-dimensional data, however, effects attributed to the “curse of dimensionality” render clustering meaningless. For this reason, recent years have seen research on subspace clustering, which searches for clusters in relevant subspace projections of high-dimensional data. As the number of possible subspace projections is exponential in the number of dimensions, the number of possible subspace clusters can be overwhelming. In this position paper, we present our work on identifying non-redundant, relevant subspace clusters, reducing the result set to a manageable size. We discuss techniques for evaluating, visualizing and exploring subspace clusterings, and propose some directions for future work.

Similar resources

An Efficient Method for Finding Closed Subspace Clusters for High Dimensional Data

Subspace clustering tries to find groups of similar objects in a given dataset such that the objects are projected onto only a subset of the feature space. It finds meaningful clusters in all possible subspaces. However, when it comes to the quality of the resulting subspace clusters, most of them are redundant. These redundant subspace clusters do not provide new information. ...

ASCLU: Alternative Subspace Clustering

Finding groups of similar objects in databases is one of the most important data mining tasks. Recently, traditional clustering approaches have been extended to generate alternative clustering solutions. The basic observation is that, for each database object, multiple meaningful groupings might exist: the data can be clustered from different perspectives. It is thus reasonable to search...

Subspace clustering for complex data

Clustering is an established data mining technique for grouping objects based on their mutual similarity. In today’s applications, however, many characteristics are usually recorded for each object, so one cannot expect to find similar objects by considering all attributes together. Instead, valuable clusters are hidden in subspace projections of the data. As a general solution to this p...

Subspace clustering of gene expression data with prior knowledge

Subspace clustering methods such as biclustering have been studied for finding genes activated under specific conditions or in specific cell cycles, whose gene expression levels are highly correlated only under the active conditions. However, existing methods suffer from a lack of cluster reliability caused by over-fitting and from the difficulty of interpreting the clusters because of the generation o...

Feature Selection based Semi-Supervised Subspace Clustering

Clustering is the process of assigning a set of n objects to clusters (groups). Dimensionality reduction techniques help increase the accuracy of clustering results by removing redundant and irrelevant dimensions. In most situations, however, objects can be related in different ways in different subsets of the dimensions. Dimensionality reduction tends to get rid of such rel...

Publication date: 2010